Modeling Phone Duration of Lithuanian by Classification and Regression Trees, using Very Large Speech Corpus
Authors
Abstract
A classification and regression tree (CART) approach was used in this research to model the phone duration of Lithuanian. 300 thousand vowel samples and 400 thousand consonant samples extracted from the VDU-AB20 corpus were used in the experimental part of the research. A set of 15 parameters characterizing a phone and its context was selected for duration prediction. The most significant of them were the identifier (ID) of the phone being predicted, the IDs of adjacent phones, and the number of phones in the syllable. Models were built using two different data sets: one speaker and 20 speakers. The influence of cost-complexity pruning and of different pre-pruning values was investigated. Prediction by the average leaf duration was also compared with prediction by the median leaf duration. The largest errors were investigated, speech rate normalization and trivial noise reduction were applied, and their influence on the model evaluation parameters is discussed. The achieved results, correlations of 0.8 and 0.75 for vowels and consonants respectively and an RMSE of ∼18 ms, are comparable with those reported for Czech, Hindi, Telugu and Korean.
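The comparison of mean and median leaf prediction, and the two evaluation measures the abstract reports (RMSE and correlation), can be illustrated with a small sketch. This is not the authors' implementation: the leaf names, the toy durations and the helper functions `leaf_predictions`, `rmse` and `pearson` are hypothetical and use only the Python standard library. It assumes the tree has already been grown, so that each leaf simply holds the durations of the training samples that reached it.

```python
import math
from statistics import mean, median


def leaf_predictions(leaves, use_median=False):
    """Map each leaf id to a single predicted duration (ms).

    `leaves` maps a leaf id to the durations of the training samples
    that ended up in that leaf (hypothetical structure for this sketch).
    """
    aggregate = median if use_median else mean
    return {leaf_id: aggregate(durations) for leaf_id, durations in leaves.items()}


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(mean((a - p) ** 2 for a, p in zip(actual, predicted)))


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: durations in milliseconds, grouped by leaf.
leaves = {
    "leaf_a": [50.0, 55.0, 60.0, 120.0],  # one long outlier pulls the mean up
    "leaf_b": [90.0, 95.0, 100.0],
}

by_mean = leaf_predictions(leaves)
by_median = leaf_predictions(leaves, use_median=True)
```

In the toy data the outlier in `leaf_a` shifts the mean prediction well above the median, which is the kind of difference a mean-versus-median comparison is meant to expose. Whether either choice improves RMSE or correlation on real data is an empirical question that this sketch does not answer.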
Similar resources
Towards Acoustic Modeling of Lithuanian Speech
In this paper we present an experimental investigation of using various phone sets for acoustic modeling of Lithuanian speech applied to large vocabulary continuous speech recognition. The paper presents specifics of Lithuanian speech acoustics including accentuation, diphthongs, softening and assimilation of consonants. The speech recognition experiments use only acoustic model since effective langua...
Segmental duration modeling in Turkish
The naturalness of synthetic speech depends heavily on appropriate modeling of prosodic aspects. Mostly, three prosody components are modeled: segmental duration, pitch contour and intensity. In this study, we present our work on modeling segmental duration in Turkish using machine-learning algorithms, especially Classification and Regression Trees (CART). The models predict phone durations based on ...
Allophone-based acoustic modeling for Persian phoneme recognition
Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighbor phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...
Building Medium-Vocabulary Isolated-Word Lithuanian HMM Speech Recognition System
In this paper, the initial work on the development of a Lithuanian HMM speech recognition system is described. The triphone single-Gaussian HMM speech recognition system based on Mel Frequency Cepstral Coefficients (MFCC) was developed using the HTK toolkit. The hidden Markov model parameters were estimated from a phone-level hand-annotated Lithuanian speech corpus. The system was evaluated on a speake...
Tree-based modeling of prosodic phrasing and segmental duration for Korean TTS systems
This study describes the tree-based modeling of prosodic phrasing, pause duration between phrases and segmental duration for Korean TTS systems. We collected 400 sentences from various genres and built a corresponding speech corpus uttered by a professional female announcer. The phonemic and prosodic boundaries were manually marked on the recorded speech, and morphological analysis, grapheme-to...
Journal:
- Informatica, Lith. Acad. Sci.
Volume 19, Issue
Pages -
Publication date: 2008